ARC AGI 2 AI News List

Time	Headline
2026-06-24 18:55	Gemini 3 Pro tops ARC-AGI-2 with 31%
2026-03-02 23:53	ARC-AGI-2 Results: Chinese Open-Weight Models Underperform Frontier LLMs (Data-Backed Analysis)
2026-02-19 16:21	Gemini 3.1 Pro Launch: Latest Benchmark Breakthrough with 77.1% ARC-AGI-2 Score (2026 Analysis)

2026-06-24 18:55

Gemini 3 Pro tops ARC-AGI-2 with 31%

According to Ethan Mollick, Gemini 3 Pro first reached 31% on ARC-AGI-2 in November 2025, giving it an 8–12 month lead over open models such as GLM 5.2, which scores 22.8%.

Source

2026-03-02 23:53

ARC-AGI-2 Results: Chinese Open-Weight Models Underperform Frontier LLMs (Data-Backed Analysis)

According to ARC Prize on X (in a post amplified by Ethan Mollick), semi-private ARC-AGI-2 results show Kimi K2.5 scoring 12% at $0.28, Minimax M2.5 5% at $0.17, GLM-5 5% at $0.27, and DeepSeek V3.2 4% at $0.12, all below the July 2025 frontier lab models that ARC Prize uses as a reference. According to ARC Prize, these results indicate that current Chinese open-weight models are strong on narrow tasks but weaker than leading closed models on generalization and out-of-distribution reasoning, a gap that matters for reliability-critical use cases such as autonomous agents and complex tool-use pipelines. As reported by ARC Prize, the cost-performance figures suggest competitive token pricing but insufficient reasoning yield, which points enterprises toward hybrid stacks: frontier closed models for the hardest reasoning, and open-weight models for domain-specific, cost-sensitive workflows.

Source
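
To make the cost-performance comparison in the entry above concrete, here is a minimal sketch that ranks the four quoted models by score per dollar. The scores and dollar figures are the ones quoted from ARC Prize. Treating the dollar figure as a per-task cost is an assumption, since the entry does not state the unit, and `points_per_dollar` and `rank_by_yield` are hypothetical helper names used only for illustration, not a metric ARC Prize publishes.

```python
# Scores (%) and costs (USD) as quoted in the entry above.
# Assumption: the dollar figure is a per-task cost; the entry does not state the unit.
RESULTS = {
    "Kimi K2.5": (12.0, 0.28),
    "Minimax M2.5": (5.0, 0.17),
    "GLM-5": (5.0, 0.27),
    "DeepSeek V3.2": (4.0, 0.12),
}


def points_per_dollar(score_pct: float, cost_usd: float) -> float:
    """Hypothetical yield metric: percentage points scored per dollar spent."""
    return score_pct / cost_usd


def rank_by_yield(results: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (model, points per dollar) pairs, highest yield first."""
    ranked = [
        (name, points_per_dollar(score, cost))
        for name, (score, cost) in results.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank_by_yield(RESULTS):
        print(f"{name}: {value:.1f} points per dollar")
```

A ratio like this only compares the open-weight models with each other. A model can rank well on points per dollar and still sit at a low absolute score, which is the gap the entry describes as insufficient reasoning yield.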

2026-02-19 16:21

Gemini 3.1 Pro Launch: Latest Benchmark Breakthrough with 77.1% ARC-AGI-2 Score (2026 Analysis)

According to Demis Hassabis on X (@demishassabis), Google DeepMind launched Gemini 3.1 Pro with major gains in core reasoning and problem solving. The model scores 77.1% on the ARC-AGI-2 benchmark, more than double Gemini 3 Pro's performance, and is rolling out in Gemini App and Antigravity today. As reported by Hassabis, these improvements signal stronger generalization and few-shot capabilities, which can translate into higher accuracy for enterprise agents, code assistants, and automated analytics workflows. According to the announcement, immediate availability in product surfaces enables faster A/B testing, developer adoption, and monetization for partners integrating Gemini 3.1 Pro via app ecosystems.

Source
